EKF-SIRD model algorithm for predicting the coronavirus (COVID-19) spreading dynamics

In this paper, we study the COVID-19 disease profile in the Algerian territory from February 25, 2020 to February 13, 2021. The idea is to develop a decision support system that gives public health decision-makers and policy-makers future statistics of the pandemic (daily predictions of the parameters) and also encourages citizens to follow health protocols. Many studies have applied traditional epidemic models or machine learning models to forecast the evolution of the coronavirus epidemic, but using such models alone makes the prediction less precise. For this purpose, we assume that the spread of the coronavirus is a moving target described by an epidemic model. On the basis of a SIRD model (Susceptible-Infected-Recovered-Deceased), we apply the extended Kalman filter (EKF) algorithm to predict all parameters daily. These predicted parameters will be beneficial to hospital managers for updating the available means of hospitalization (beds, oxygen concentrators, etc.) in order to reduce the mortality rate and the number of infected. The simulations carried out show that the trajectories predicted by the EKF closely follow the real data.

Since its appearance in late December 2019, the new COVID-19 epidemic has spread rapidly across the world. The first cases of COVID-19 were reported in Wuhan, China, and the disease then spread to Europe and to North and South America, affecting most developed countries such as Italy, France and the USA, where sporadic cases were imported via travelers returning from China.
Although it long seemed spared, or almost spared, by COVID-19, the African continent is not immune to this coronavirus epidemic. It is now affected like the rest of the world, even if the number of deaths remains very limited. A sudden acceleration in the number of cases was observed in July and August 2020, and then the contaminations slowed down again. At the start of 2021, we are witnessing a "new wave" that is very visible in the north of the continent and observable in several large countries of the east and the south, while the health authorities are getting organized for the arrival of the first doses of vaccine.
As of February 13, 2021, the virus had spread to most African countries, with more than 3 734 227 confirmed cases and more than 97 863 reported deaths, including Algeria with 112 461 cases and 2 970 deaths [1].
To control and prevent the spread of this coronavirus outbreak, the Algerian authorities have implemented various containment measures since March 28, 2020, including traffic restrictions, contact tracing, mandatory face masks in public spaces, entry and exit screening, quarantine and awareness campaigns.
The current outbreak of coronavirus disease (COVID-19) has been declared a Public Health Emergency of International Concern and a pandemic by the World Health Organization (WHO). This alarming situation has prompted scientists to undertake studies on the transmission dynamics and forecasting of the virus in the most affected countries in the world, such as China [2,3], Italy [4], France [5] and India [6], and then in other countries [7-13].
These works focus on epidemiological studies whose main objective is to develop strategies to fight the spread of the coronavirus and to provide guidance for controlling its transmission dynamics [14,15]. A considerable number of these strategies involve mathematical models dedicated to the study of infectious diseases, such as SIRD, SIR or SEIR models, in different contexts [16][17][18] (analysis, forecasting the spread and prediction).
Most of these works are devoted to modeling the transmission dynamics, with the aim of predicting the trend of the epidemic and controlling the evolution of the outbreak. In this context, the authors of [19][20][21][22][23] proposed mathematical models of the transmission dynamics of COVID-19 to forecast the number of active cases or to estimate the total number of infected and deaths [7], while the authors of [5] develop a strategy based on the SIR model to estimate the actual number of people infected and to deduce the IFR (infection fatality ratio). The forecast of future COVID-19 cases is discussed in [24] using regression analysis. The estimation of the infection, mortality and recovery rates and of the basic reproduction number (R_0) is provided in [3] using a SIRD model. Afterwards, Dhillon et al. study the trend of the mortality and recovery rates, considering the scenario of the most affected countries and Indian states [6]. To mitigate disease transmission, mathematical models introducing quarantine measures are formulated by Liu et al. in [2] and by Mandal et al. in [25]. Other research works establish the prediction of the epidemic peak under the impact of lockdown using an improvised compartmental mathematical model [26] (SEIR or SEIRD, i.e., Susceptible (S)-Exposed (E)-Infected (I)-Recovered (R)-Death (D)), while in [27,28] the authors forecast the spread tendency of COVID-19 through an improved SEIR model. Other methods, such as fractional-order concepts, optimization algorithms and artificial neural networks, are introduced sometimes to study the growth of the cumulative confirmed and cured cases, sometimes to formulate the prediction problem as an optimization framework [29], and sometimes to estimate the COVID-19 cases [8].
Additionally, to contain the spread of the epidemic in African countries, research on modeling and forecasting COVID-19 is being conducted in the most affected countries. Among these works, there have been some comparative studies between African countries, including Algeria [30][31][32].
However, there are few peer-reviewed papers on the epidemiological profile in the Algerian territory; these studies consider traditional epidemic models (SIR, SI, SEIR, …) dedicated to historical data analysis for forecasting the incidence and/or estimating parameters [33][34][35][36][37].
In all these methodologies, the authors consider mathematical models whose parameters are estimated over a limited period of time. Once defined, the model is applied in different studies of the COVID-19 evolution without updating the model parameters or taking into account the various measures taken by the authorities.
In this work, we transfer the engineering techniques used in target tracking to epidemiology, assuming that the spread of the coronavirus is a moving target described by an epidemic model. The idea is to apply the Kalman filter to a SIRD model in order to predict the spread of COVID-19 and to manage the burden of the COVID-19 pandemic in Algeria effectively.
This study shows the disease profile in the Algerian territory from February 25, 2020 to February 13, 2021. We are interested in applying the extended Kalman filter (EKF) to an epidemic SIRD model to provide a daily prediction of the infection, mortality and recovery rates and of the basic reproduction number (R_0).
In addition, these data are beneficial to hospital managers and public health decision-makers for updating the available means of hospitalization (beds, oxygen concentrators, etc.) in order to reduce the mortality rate and the number of infected.
The rest of this paper is organized in four sections. The "Problem formulation" section is dedicated to the problem formulation and the description of the chosen model. The next section details the Bayesian approach and, more precisely, the EKF algorithm used in this work. The application of this technique and the simulation results are discussed in the "Simulation results" section. Finally, the last section gives concluding remarks and suggests some outlooks for future work.

Problem formulation
In the literature, epidemiological studies use mathematical models that stratify the population into compartments and describe the dynamics of the individuals in each of them. The choice of these compartments depends on the problem formulation. Most epidemic models for human-to-human transmission rely on the susceptible-infected-recovered (SIR) structure, a fundamental model widely used to describe various infectious diseases. The SIRD model is the standard SIR model with an additional compartment: the Death class (D). Other structures have emerged to monitor the dynamics of other compartments (classes), such as quarantined susceptible individuals, asymptomatic infectious individuals, isolated infected individuals, exposed individuals, etc. [3,21,22,26,37]. For the SIRD model, the population N is divided into four sub-populations: susceptible (S), infected (I), recovered (R) and deceased (D), so that for all times k, N = S + I + R + D.
The discrete nonlinear SIRD model is given by: where α(k), β(k) and γ(k) are the daily infection, daily recovery and daily death rates respectively (see Fig. 1). Note that these rates are optimized daily using the least squares method (LSM) as follows: If we accept that S = N, then: If S = N, then: I(j) is the total number of currently infected individuals at time j (day).
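To make the model concrete, the following is a minimal sketch of one day of a discrete SIRD update in its standard textbook form, assuming mass-action incidence α·S·I/N. It also shows one plausible least-squares rate estimate, where a constant rate is fitted through the origin to daily increments. The function names `sird_step` and `fit_rate`, the fitting window and all numerical values are hypothetical and serve only as illustration; they are not taken from the paper's equations.

```python
def sird_step(state, alpha, beta, gamma):
    """Advance (S, I, R, D) by one day with the standard discrete SIRD update."""
    S, I, R, D = state
    N = S + I + R + D                      # total population is conserved
    new_infections = alpha * S * I / N
    new_recoveries = beta * I
    new_deaths = gamma * I
    return (S - new_infections,
            I + new_infections - new_recoveries - new_deaths,
            R + new_recoveries,
            D + new_deaths)


def fit_rate(infected, increments):
    """Least-squares slope r through the origin for increments[j] ~ r * infected[j]."""
    num = sum(i * d for i, d in zip(infected, increments))
    den = sum(i * i for i in infected)
    return num / den if den > 0 else 0.0


# Synthetic check with made-up rates: simulate 30 days, then recover beta.
trajectory = [(999_900.0, 100.0, 0.0, 0.0)]
for _ in range(30):
    trajectory.append(sird_step(trajectory[-1], alpha=0.25, beta=0.08, gamma=0.01))

infected = [s[1] for s in trajectory[:-1]]
recovered_increments = [b[2] - a[2] for a, b in zip(trajectory, trajectory[1:])]
beta_hat = fit_rate(infected, recovered_increments)
print(f"estimated recovery rate: {beta_hat:.4f}")
```

The same `fit_rate` call applied to the daily increments of D would give an estimate of γ under the same assumption.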
Then the SIRD model becomes: X_k is the state vector including susceptible (S), infected (I), recovered (R) and deceased (D), defined as: V_k is a zero-mean white noise with covariance Q_V. The Jacobian matrix of this model is obtained as: where α(k), β(k) and γ(k) are the predicted daily infection, predicted daily recovery and predicted daily death rates respectively, and are calculated as: The predicted number of daily new cases is the sum of the predicted daily new currently infected, the predicted daily new recovered and the predicted daily new deceased.
We suppose that the measurement equation is given daily by: where W_k is a zero-mean white noise with covariance W
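As an illustration of the Jacobian mentioned above, here is a sketch that assumes the standard discrete SIRD transition with state X = [S, I, R, D] and a fixed population N. The analytic Jacobian is checked against central finite differences. The names `sird_f`, `sird_jacobian` and `numerical_jacobian`, and the numerical values, are hypothetical; the paper's own matrix may be written differently.

```python
import numpy as np


def sird_f(x, alpha, beta, gamma, N):
    """Standard discrete SIRD transition (assumed form) for x = [S, I, R, D]."""
    S, I, R, D = x
    return np.array([
        S - alpha * S * I / N,
        I + alpha * S * I / N - (beta + gamma) * I,
        R + beta * I,
        D + gamma * I,
    ])


def sird_jacobian(x, alpha, beta, gamma, N):
    """Analytic Jacobian of sird_f with respect to the state."""
    S, I, _, _ = x
    return np.array([
        [1.0 - alpha * I / N, -alpha * S / N, 0.0, 0.0],
        [alpha * I / N, 1.0 + alpha * S / N - beta - gamma, 0.0, 0.0],
        [0.0, beta, 1.0, 0.0],
        [0.0, gamma, 0.0, 1.0],
    ])


def numerical_jacobian(f, x, eps=1.0):
    """Central finite differences, used only to sanity-check the analytic form."""
    x = np.asarray(x, dtype=float)
    J = np.zeros((x.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return J


# Made-up state and rates, chosen only to exercise the functions.
x0 = np.array([9.0e5, 5.0e4, 4.0e4, 1.0e4])
params = dict(alpha=0.25, beta=0.08, gamma=0.01, N=1.0e6)
F_analytic = sird_jacobian(x0, **params)
F_numeric = numerical_jacobian(lambda x: sird_f(x, **params), x0)
print(np.max(np.abs(F_analytic - F_numeric)))
```

Because the assumed transition is quadratic in the state, the central difference agrees with the analytic Jacobian up to rounding error.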

Bayesian filtering
In the Bayesian approach, we attempt to construct the posterior PDF of the state given all measurements. All available information is used to form this PDF, so it represents the complete solution. Let X_k, k ∈ N, be the state sequence: where f_k is in general a nonlinear function of the previous state X_{k−1} ∈ R^{n_x}, V_{k−1} ∈ R^{n_v} is the state noise, u_{k−1} ∈ R^{n_u} is a known input, and n_x, n_v and n_u are the dimensions of the state, process noise and input vectors. Let Y_k be the measurement: where Y_k ∈ R^{n_y}, h_k is in general a nonlinear measurement function, W_k ∈ R^{n_w} is the measurement noise, and n_y and n_w are the dimensions of the measurement and measurement noise vectors. We want to estimate X_k based on all measurements available at time k (denoted Y_{1:k}) by constructing the posterior PDF p(X_k | Y_{1:k}). It is assumed that the initial PDF p(X_0 | Y_0) ≡ p(X_0) is available. The posterior PDF can be obtained recursively in two stages, namely prediction and update. Suppose that the required PDF p(X_{k−1} | Y_{1:k−1}) at time step k − 1 is available. Then, using the system model, it is possible to obtain the prior PDF of the state at time step k [38,39]: The prediction step usually deforms and spreads the state PDF because of the noise. The measurement Y_k becomes available at time step k, so it can be used to update the prior. Using Bayes' rule, we obtain: where the normalizing constant is: In the update Eq. (19), the measurement Y_k is used to modify the predicted prior from the previous time step to obtain the posterior PDF of the state. Equations (17) and (18) theoretically give the optimal Bayesian solution. However, this is only a conceptual solution, because the integrals in these equations are in general intractable. A closed-form solution exists in some restricted cases, such as the one handled by the Kalman filter.
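For reference, the recursion described above can be written in its standard textbook form as follows, in the notation of this section. This is a sketch of the usual state-space model, Chapman-Kolmogorov prediction and Bayes update; the equation numbering of the paper is not reproduced.

```latex
\begin{align*}
X_k &= f_k\left(X_{k-1}, u_{k-1}, V_{k-1}\right), \qquad
Y_k = h_k\left(X_k, W_k\right) \\[4pt]
\text{Prediction:}\quad
p\left(X_k \mid Y_{1:k-1}\right)
  &= \int p\left(X_k \mid X_{k-1}\right)\,
          p\left(X_{k-1} \mid Y_{1:k-1}\right)\, dX_{k-1} \\[4pt]
\text{Update:}\quad
p\left(X_k \mid Y_{1:k}\right)
  &= \frac{p\left(Y_k \mid X_k\right)\, p\left(X_k \mid Y_{1:k-1}\right)}
          {p\left(Y_k \mid Y_{1:k-1}\right)} \\[4pt]
\text{Normalizing constant:}\quad
p\left(Y_k \mid Y_{1:k-1}\right)
  &= \int p\left(Y_k \mid X_k\right)\, p\left(X_k \mid Y_{1:k-1}\right)\, dX_k
\end{align*}
```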
Kalman filter. The Kalman filter and its basic variants are commonly used tools in statistical signal processing, especially in the context of causal, real-time applications.
There are several approaches to the derivation of the Kalman filter. In the first, we assume a Gaussian distribution of the driving process and of the initial state. We then derive the posterior distribution of the states given the observations and take the mean of the resulting distribution as the estimate of the state. The second approach combines a recursive weighted least-squares method with a special weighting of the previous state estimate, which plays the role of an additional measurement [40,41].
The Kalman filter can be used to estimate the state X_k ∈ R^{n_x} when the posterior PDF is Gaussian at every time step. In many cases, however, this PDF is not Gaussian and we need a different approach, such as the extended Kalman filter. This method is also labelled a sub-optimal algorithm [42,43].
Extended Kalman filter. Most real-life processes are unfortunately nonlinear and therefore need to be linearized before they can be estimated by a Kalman filter. The extended Kalman filter (EKF) [38,39,44-46] is the nonlinear version of the Kalman filter [41,42], which linearizes about an estimate of the current mean and covariance [43,47]. The state transition and measurement models for the extended Kalman filter are taken as: where V(k) is the process noise with zero mean and covariance Q_k, and W(k) is the zero-mean measurement noise with its own covariance matrix.
The functions f(X(k)) and h(X(k + 1)) are used to compute the predicted state from the previous estimate and the predicted measurement from the predicted state, respectively. Since f(X(k)) and h(X(k + 1)) cannot be applied to the covariance directly, their Jacobian matrices are used instead, evaluated at the current predicted state at each time step. The extended Kalman filter is thus based on an approximation of Bayes' rule using linearization.
The prediction (time update) and correction (measurement update) equations of the discrete-time extended Kalman filter are given by:

• Prediction (time update)
The prediction stage can be described using the following equations: where X_{k+1|k} is the predicted state estimate at time k + 1 given measurements up to time k, and P_{k+1|k} is the corresponding error covariance matrix.

• Correction (measurement update)
The update stage can be described with the following equations: where ỹ_{k+1} is the innovation term, S_{k+1} is the innovation covariance and K_{k+1} is the Kalman gain; the update also yields the updated state estimate and the updated estimate covariance. The Jacobians of the state transition and measurement functions are defined as: Figure 2 shows the EKF-SIRD algorithm.
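The two stages above can be sketched in code as a generic EKF cycle in its standard textbook form, applied here to an assumed SIRD transition. The measurement model (direct observation of I, R and D), the noise covariances, the rates and the initial values are all assumptions chosen for illustration, not the settings of this study. The names `ekf_step`, `f`, `F_jac`, `Q_cov` and `R_cov` are hypothetical.

```python
import numpy as np


def ekf_step(x, P, y, f, F_jac, h, H_jac, Q, R):
    """Run one EKF cycle and return the corrected state and covariance."""
    # Prediction (time update)
    F = F_jac(x)                          # Jacobian of f at the previous estimate
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q

    # Correction (measurement update)
    H = H_jac(x_pred)                     # Jacobian of h at the predicted state
    innovation = y - h(x_pred)
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ innovation
    P_new = (np.eye(x.size) - K @ H) @ P_pred
    return x_new, P_new


# Illustrative setup: every number below is made up, not from the paper.
N = 1.0e6
ALPHA, BETA, GAMMA = 0.25, 0.08, 0.01


def f(x):
    S, I, R, D = x
    return np.array([S - ALPHA * S * I / N,
                     I + ALPHA * S * I / N - (BETA + GAMMA) * I,
                     R + BETA * I,
                     D + GAMMA * I])


def F_jac(x):
    S, I, _, _ = x
    return np.array([[1.0 - ALPHA * I / N, -ALPHA * S / N, 0.0, 0.0],
                     [ALPHA * I / N, 1.0 + ALPHA * S / N - BETA - GAMMA, 0.0, 0.0],
                     [0.0, BETA, 1.0, 0.0],
                     [0.0, GAMMA, 0.0, 1.0]])


H = np.array([[0.0, 1.0, 0.0, 0.0],       # assumed: I, R and D are reported daily
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
h = lambda x: H @ x
H_jac = lambda x: H

rng = np.random.default_rng(0)
Q_cov = np.eye(4)                         # process noise covariance (assumed)
R_cov = 25.0 * np.eye(3)                  # measurement noise covariance (assumed)
x_true = np.array([N - 100.0, 100.0, 0.0, 0.0])
x_est, P = np.array([N, 50.0, 0.0, 0.0]), 1.0e4 * np.eye(4)

for _ in range(40):
    x_true = f(x_true)                              # noise-free synthetic truth
    y = H @ x_true + rng.normal(0.0, 5.0, size=3)   # noisy daily report
    x_est, P = ekf_step(x_est, P, y, f, F_jac, h, H_jac, Q_cov, R_cov)

max_error = np.max(np.abs(x_est[1:] - x_true[1:]))  # error on I, R and D
print(f"largest tracking error on I, R, D: {max_error:.2f}")
```

In a daily-updated scheme such as the one described in this paper, the rates would be re-estimated at each step rather than held constant as in this sketch.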

Simulation results
To apply the EKF estimator to the coronavirus (COVID-19) epidemic modelled by the SIRD model, we use in our daily predictions the real data provided by the Algerian Ministry of Health and the WHO, from February 25, 2020 to February 13, 2021.
We consider that the spread of the coronavirus is a target that begins its movement from the initial vector: where N = 44219385 is the population of Algeria. The mean vector and covariance matrix of the EKF are initialized according to a Gaussian law as: The process noise is zero-mean and white, with covariance: The measurement noise is also zero-mean and white, independent of the process noise, and with covariance: The trajectories plotted in Fig. 3a, b, c and d show the real data for Algeria and the EKF predictions of the total coronavirus cases, total currently infected, total recovered and total deceased, respectively.
We observe that the trajectories predicted by the EKF coincide with the trajectories of the real data, which indicates that the EKF correctly predicts the evolution of these quantities. The daily infection rate α(k), daily recovery rate β(k) and daily death rate γ(k) are optimised using the least squares method (LSM) according to Eqs. (5), (6), (7) and (8), and are also predicted by the EKF according to Eqs. (11), (12) and (13), as shown in Fig. 4a, b and c.
From these predictions (by LSM and by EKF), we can predict the basic reproduction number R_0 daily, as shown in Figs. 5 and 6, according to Eq. (31).
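As a sketch of how a daily R_0 series can be derived from the rates, the snippet below assumes the standard SIRD expression R_0 = α / (β + γ), which may differ in form from the paper's Eq. (31). The function name `basic_reproduction_number` and the rate values are hypothetical and are not taken from the Algerian data.

```python
def basic_reproduction_number(alpha, beta, gamma):
    """Standard SIRD expression R0 = alpha / (beta + gamma) (assumed form)."""
    removal = beta + gamma              # total daily removal rate from I
    if removal <= 0:
        raise ValueError("beta + gamma must be positive")
    return alpha / removal


# Made-up (alpha, beta, gamma) triples for three consecutive days.
daily_rates = [(0.20, 0.09, 0.01), (0.15, 0.09, 0.01), (0.08, 0.09, 0.01)]
r0_series = [basic_reproduction_number(a, b, g) for a, b, g in daily_rates]

for day, r0 in enumerate(r0_series, start=1):
    trend = "growing" if r0 > 1 else "receding"
    print(f"day {day}: R0 = {r0:.2f} ({trend})")
```

Under this expression, a value above 1 means that infections outpace recoveries and deaths combined, which is why the threshold of 1 is the reference in the discussion of the results.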
We see that, in general, the basic reproduction number stays between 1 and 2 from the end of April 2020 to October 15, 2020, except for some disturbances in July. This can be attributed to the containment measures taken by the country's officials (lockdown), including traffic restrictions, contact tracing and mandatory face masks in public spaces.
From mid-October 2020 until the end of December, we see some disturbances in the basic reproduction number because of the second coronavirus wave, during which the number of daily new cases increased and reached 1133 on November 24, 2020.
From the beginning of January 2021 until February 13, 2021, the basic reproduction number stabilizes between 1 and 1.5.

Conclusion
To track and predict the spread of the coronavirus pandemic, we investigated and analysed the outbreak of COVID-19 in Algeria, in order to help the government and the health ministry take new measures and future decisions to deal with this pandemic. To this end, we assumed that the coronavirus epidemic is a target modelled by a nonlinear SIRD model, and we applied a target tracking technique from engineering (the EKF algorithm) to the spread of the coronavirus to predict all parameters daily, i.e., susceptible (S), infected (I), recovered (R) and deceased (D).
The novelty of this work is summed up in two points: the daily updating of the model parameters and the application of the extended Kalman filter to this model, which makes the prediction results more precise and the method more reliable.
The results showed that, according to the data provided by the Algerian Ministry of Health and the WHO from February 25, 2020 to February 13, 2021, the EKF algorithm successfully predicted the daily spread of the coronavirus.